Overview

Dataset Statistics

Number of Variables 23
Number of Rows 3817
Missing Cells 5859
Missing Cells (%) 6.7%
Duplicate Rows 4
Duplicate Rows (%) 0.1%
Total Size in Memory 2.3 MB
Average Row Size in Memory 630.7 B
Variable Types
  • Categorical: 13
  • Numerical: 10
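
The dataset-level figures above can be reproduced with simple counting. The following is a minimal sketch, not the report generator's own code: `dataset_stats` and the toy rows are hypothetical, `None` stands in for a missing cell, and a row is assumed to count as a duplicate when an identical row appeared earlier. The last line cross-checks the reported missing-cell percentage from the reported counts.

```python
def dataset_stats(rows):
    """rows: list of equal-length tuples; None marks a missing cell (assumption)."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    total_cells = n_rows * n_cols
    missing = sum(cell is None for row in rows for cell in row)

    # Assumption: a row is a duplicate if an identical row was seen before it.
    seen, duplicates = set(), 0
    for row in rows:
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)

    return {
        "variables": n_cols,
        "rows": n_rows,
        "missing_cells": missing,
        "missing_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100 * duplicates / n_rows if n_rows else 0.0,
    }


# Hypothetical toy table: (property_type, bedRoom, carpet_area)
toy = [("flat", 2, None), ("house", 3, 1200.0), ("flat", 2, None), ("flat", 1, 650.0)]
stats = dataset_stats(toy)

# Cross-check of the reported figure: 5859 missing cells in 23 x 3817 cells.
reported_missing_pct = round(100 * 5859 / (23 * 3817), 1)
```

With the reported counts, the cross-check rounds to the 6.7% shown above.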

Dataset Insights

super_builtup_area has 1923 (50.38%) missing values
builtup_area has 2018 (52.87%) missing values
carpet_area has 1896 (49.67%) missing values
price is skewed
price_per_sqft is skewed
area is skewed
bedRoom is skewed
bathroom is skewed
floorNum is skewed
super_builtup_area is skewed
builtup_area is skewed
carpet_area is skewed
luxury_score is skewed
society has a high cardinality: 723 distinct values
sector has a high cardinality: 189 distinct values
areaWithType has a high cardinality: 2436 distinct values
study room has constant length 1
servant room has constant length 1
store room has constant length 1
pooja room has constant length 1
others has constant length 1
furnishing_type has constant length 1
luxury_score has 530 (13.89%) zeros

Variables


property_type

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 264318
  • The most frequent value (flat) occurs over 3.04 times as often as the second most frequent value (house)

Length

Mean 4.2476
Standard Deviation 0.4317
Median 4
Minimum 4
Maximum 5

Sample

1st row flat
2nd row flat
3rd row flat
4th row flat
5th row flat

Letter

Count 16213
Lowercase Letter 16213
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (flat, house) take over 50.0%
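
The Length and Letter blocks of a categorical variable can be recomputed from the raw strings. The sketch below is an illustration under stated assumptions, not the report generator's implementation: `length_stats` and `char_category_counts` are hypothetical helpers, the row labels are assumed to map to Unicode general categories (Ll, Lu, Zs, Pd, Nd), and "Count" is assumed to be the sum of lowercase and uppercase letters, which matches the figures in this report. The cross-check uses the only split consistent with 3817 rows, 16213 letters, and lengths of 4 or 5: 2872 values of "flat" and 945 of "house".

```python
import statistics
import unicodedata
from collections import Counter

# Report row labels keyed by Unicode general category (assumed mapping).
CATEGORY_LABELS = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def length_stats(values):
    """Summary of string lengths; the standard deviation uses n - 1 (assumption)."""
    lengths = [len(v) for v in values]
    return {
        "mean": statistics.fmean(lengths),
        "std": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
        "median": statistics.median(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


def char_category_counts(values):
    """Count characters per category; other categories (e.g. '+') are ignored."""
    counts = Counter()
    for value in values:
        for ch in value:
            label = CATEGORY_LABELS.get(unicodedata.category(ch))
            if label is not None:
                counts[label] += 1
    # Assumption: "Count" is the total number of letters.
    counts["Count"] = counts["Lowercase Letter"] + counts["Uppercase Letter"]
    return counts


sample_counts = char_category_counts(["flat", "flat", "house", "North-East", "sector 90"])

# Cross-check against the property_type figures above.
lengths_check = length_stats(["flat"] * 2872 + ["house"] * 945)
```

Under these assumptions the cross-check reproduces the mean of 4.2476 and, with the n - 1 form, the standard deviation of 0.4317.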

society

categorical

Approximate Distinct Count 723
Approximate Unique (%) 18.9%
Missing 1
Missing (%) 0.0%
Memory Size 312038
  • The most frequent value (independent) occurs over 7.49 times as often as the second most frequent value (tulip violet)

Length

Mean 16.771
Standard Deviation 6.4163
Median 15
Minimum 1
Maximum 49

Sample

1st row dlf regal gardens
2nd row tulip violet
3rd row experion the heart...
4th row signature global c...
5th row puri diplomatic gr...

Letter

Count 57314
Lowercase Letter 57314
Space Separator 6110
Uppercase Letter 0
Dash Punctuation 11
Decimal Number 548
  • The most frequent word (independent) occurs over 2.55 times as often as the second most frequent word (dlf)

sector

categorical

Approximate Distinct Count 189
Approximate Unique (%) 5.0%
Missing 0
Missing (%) 0.0%
Memory Size 285256

Length

Mean 9.733
Standard Deviation 2.3345
Median 9
Minimum 6
Maximum 41

Sample

1st row sector 90
2nd row sector 69
3rd row sector 108
4th row sector 92
5th row sector 111

Letter

Count 25423
Lowercase Letter 25413
Space Separator 4111
Uppercase Letter 10
Dash Punctuation 61
Decimal Number 7549
  • The most frequent word (sector) occurs over 14.52 times as often as the second most frequent word (road)
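
The word-level insight above compares token frequencies rather than whole cell values. A minimal sketch of that comparison follows; `top_word_ratio` is a hypothetical helper, and lowercasing plus whitespace splitting is an assumed tokenization (the report does not state its own). The toy values reuse sample rows from this section, and the two "road" entries are invented for illustration.

```python
from collections import Counter


def top_word_ratio(values):
    """Return (top word, second word, frequency ratio) over all tokens."""
    # Assumed tokenization: lowercase, split on whitespace.
    words = Counter(word for value in values for word in value.lower().split())
    (top, n_top), (second, n_second) = words.most_common(2)
    return top, second, n_top / n_second


# "sector ..." rows come from the sample above; the "road" rows are hypothetical.
toy_sectors = ["sector 90", "sector 69", "sector 108", "sohna road", "golf course road"]
top, second, ratio = top_word_ratio(toy_sectors)
```

Here "sector" appears three times and "road" twice, so the ratio is 1.5.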

price

numerical

Approximate Distinct Count 479
Approximate Unique (%) 12.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 61072
Mean 2.5019
Minimum 0.07
Maximum 31.5
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • price is skewed right (γ1 = 3.2991)

Quantile Statistics

Minimum 0.07
5-th Percentile 0.37
Q1 0.92
Median 1.5
Q3 2.7
95-th Percentile 8.49
Maximum 31.5
Range 31.43
IQR 1.78

Descriptive Statistics

Mean 2.5019
Standard Deviation 2.9509
Variance 8.7077
Sum 9549.72
Skewness 3.2991
Kurtosis 15.1544
Coefficient of Variation 1.1795
  • price is not normally distributed (p-value 2.90267480968223e-14)
  • price has 439 outliers
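
The report does not state which rule produces the outlier count. A common choice for box-plot style outliers is Tukey's fences at 1.5 times the IQR beyond the quartiles, and the sketch below assumes that rule, so it may not reproduce the count of 439 exactly. `tukey_fences` and `count_outliers` are hypothetical helpers; the quartiles are the ones reported for price, and the toy list is made up from values that appear in the quantile table.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper fences; values outside them are treated as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(values, q1, q3, k=1.5):
    low, high = tukey_fences(q1, q3, k)
    return sum(1 for v in values if v < low or v > high)


# Quartiles reported for price: Q1 = 0.92, Q3 = 2.7 (IQR = 1.78).
low, high = tukey_fences(0.92, 2.7)

# Toy data only; not the real price column.
toy_prices = [0.07, 0.92, 1.5, 2.7, 5.3, 8.49, 31.5]
n_out = count_outliers(toy_prices, 0.92, 2.7)
```

Under this assumption the upper fence for price is about 5.37, so the 95th percentile (8.49) and the maximum (31.5) both fall outside it.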

price_per_sqft

numerical

Approximate Distinct Count 2731
Approximate Unique (%) 71.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 61072
Mean 14014.4021
Minimum 2
Maximum 600000
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • price_per_sqft is skewed right (γ1 = 11.0005)

Quantile Statistics

Minimum 2
5-th Percentile 4666
Q1 6808
Median 9016
Q3 13888
95-th Percentile 33333
Maximum 600000
Range 599998
IQR 7080

Descriptive Statistics

Mean 14014.4021
Standard Deviation 23332.3012
Variance 5.444e+08
Sum 5.3493e+07
Skewness 11.0005
Kurtosis 176.0118
Coefficient of Variation 1.6649
  • price_per_sqft is not normally distributed (p-value 3.9802063871575157e-23)
  • price_per_sqft has 368 outliers
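
Several rows of the Descriptive Statistics block are derived from one another: the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The sketch below shows these relations with a hypothetical `describe` helper; using the sample (n - 1) standard deviation is an assumption. The two cross-checks use only figures reported above.

```python
import math
import statistics


def describe(values):
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "sum": math.fsum(values),
        "cv": std / mean,  # coefficient of variation
    }


# Toy data only.
toy = describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Cross-checks from the reported standard deviations and means.
cv_price = round(2.9509 / 2.5019, 4)
cv_price_per_sqft = round(23332.3012 / 14014.4021, 4)
```

Both cross-checks agree with the reported coefficients of variation, 1.1795 for price and 1.6649 for price_per_sqft.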

area

numerical

Approximate Distinct Count 1351
Approximate Unique (%) 35.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 61072
Mean 4745.7047
Minimum 45
Maximum 7.25e+06
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • area is skewed right (γ1 = 58.6487)

Quantile Statistics

Minimum 45
5-th Percentile 500
Q1 1200
Median 1711
Q3 2295
95-th Percentile 4246.8
Maximum 7.25e+06
Range 7.25e+06
IQR 1095

Descriptive Statistics

Mean 4745.7047
Standard Deviation 119479.4308
Variance 1.4275e+10
Sum 1.8114e+07
Skewness 58.6487
Kurtosis 3542.7043
Coefficient of Variation 25.1763
  • area is not normally distributed (p-value 4.226633379809184e-25)
  • area has 224 outliers
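
Skewness (γ1) and kurtosis summarize how far a single extreme value, such as the 7.25e+06 maximum here, pulls the distribution's tail. The sketch below uses the plain moment-based definitions; whether the report applies a bias correction is not stated, so treat this as an assumption. The kurtosis is computed as excess kurtosis (normal distribution = 0), which is consistent with the negative kurtosis reported for luxury_score. `skewness_and_excess_kurtosis` is a hypothetical helper.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based (uncorrected) skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis


# Symmetric toy data: skewness is zero.
sym_skew, sym_kurt = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])

# One large value on the right: positive (right) skew.
tail_skew, tail_kurt = skewness_and_excess_kurtosis([1, 1, 1, 1, 10])
```

A single large value is enough to turn a symmetric sample into a right-skewed one, which is the pattern the area statistics show at a much larger scale.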

areaWithType

categorical

Approximate Distinct Count 2436
Approximate Unique (%) 63.8%
Missing 0
Missing (%) 0.0%
Memory Size 437722

Length

Mean 49.677
Standard Deviation 25.8986
Median 38
Minimum 12
Maximum 108

Sample

1st row super built up are...
2nd row super built up are...
3rd row super built up are...
4th row super built up are...
5th row super built up are...

Letter

Count 85831
Lowercase Letter 85831
Space Separator 23380
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 48539
  • The most frequent word (area) occurs over 1.52 times as often as the second most frequent word (sq)

bedRoom

numerical

Approximate Distinct Count 21
Approximate Unique (%) 0.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 61072
Mean 3.3725
Minimum 1
Maximum 36
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • bedRoom is skewed right (γ1 = 4.7808)

Quantile Statistics

Minimum 1
5-th Percentile 2
Q1 2
Median 3
Q3 4
95-th Percentile 6
Maximum 36
Range 35
IQR 2

Descriptive Statistics

Mean 3.3725
Standard Deviation 2.0157
Variance 4.063
Sum 12873
Skewness 4.7808
Kurtosis 44.6118
Coefficient of Variation 0.5977
  • bedRoom is not normally distributed (p-value 2.038241175990573e-17)
  • bedRoom has 151 outliers

bathroom

numerical

Approximate Distinct Count 21
Approximate Unique (%) 0.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 61072
Mean 3.415
Minimum 1
Maximum 36
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • bathroom is skewed right (γ1 = 4.5387)

Quantile Statistics

Minimum 1
5-th Percentile 1
Q1 2
Median 3
Q3 4
95-th Percentile 6
Maximum 36
Range 35
IQR 2

Descriptive Statistics

Mean 3.415
Standard Deviation 2.0522
Variance 4.2114
Sum 13035
Skewness 4.5387
Kurtosis 42.3201
Coefficient of Variation 0.6009
  • bathroom is not normally distributed (p-value 1.3610723789755586e-14)
  • bathroom has 129 outliers

balcony

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 253107

Length

Mean 1.3105
Standard Deviation 0.4627
Median 1
Minimum 1
Maximum 2

Sample

1st row 3
2nd row 1
3rd row 3+
4th row 2
5th row 3+

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 3817
  • The top 2 categories (3+, 3) take over 50.0%

floorNum

numerical

Approximate Distinct Count 44
Approximate Unique (%) 1.2%
Missing 21
Missing (%) 0.6%
Infinite 0
Infinite (%) 0.0%
Memory Size 60736
Mean 6.6707
Minimum 0
Maximum 51
Zeros 134
Zeros (%) 3.5%
Negatives 0
Negatives (%) 0.0%
  • floorNum is skewed right (γ1 = 1.7417)

Quantile Statistics

Minimum 0
5-th Percentile 1
Q1 2
Median 4
Q3 10
95-th Percentile 18
Maximum 51
Range 51
IQR 8

Descriptive Statistics

Mean 6.6707
Standard Deviation 6.0043
Variance 36.0512
Sum 25322
Skewness 1.7417
Kurtosis 4.7341
Coefficient of Variation 0.9001
  • floorNum is not normally distributed (p-value 3.7358615548624795e-09)
  • floorNum has 83 outliers

facing

categorical

Approximate Distinct Count 9
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory Size 281074
  • The most frequent value (not available) occurs over 1.72 times as often as the second most frequent value (East)

Length

Mean 8.6374
Standard Deviation 3.6554
Median 10
Minimum 4
Maximum 13

Sample

1st row North-East
2nd row North-East
3rd row North
4th row not available
5th row North

Letter

Count 30680
Lowercase Letter 26798
Space Separator 1112
Uppercase Letter 3882
Dash Punctuation 1177
Decimal Number 0
  • The most frequent word (available) occurs over 1.72 times as often as the second most frequent word (east)

agePossession

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory Size 299053
  • The most frequent value (Relatively New) occurs over 2.73 times as often as the second most frequent value (New Property)

Length

Mean 13.3477
Standard Deviation 1.9704
Median 14
Minimum 9
Maximum 18

Sample

1st row Relatively New
2nd row Relatively New
3rd row Relatively New
4th row New Property
5th row Relatively New

Letter

Count 47463
Lowercase Letter 40161
Space Separator 3485
Uppercase Letter 7302
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Relatively New, New Property) take over 50.0%
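
Both category insights for this variable come from a frequency table of the values. The following sketch shows one way to derive them; `category_insights` is a hypothetical helper, and the toy counts are invented (only the two category names are taken from the report, and "other" is a placeholder).

```python
from collections import Counter


def category_insights(values):
    """Ratio of the top two frequencies and their combined share of all rows."""
    counts = Counter(values)
    ranked = counts.most_common()
    total = sum(counts.values())
    (top, n_top), (second, n_second) = ranked[0], ranked[1]
    return {
        "top": top,
        "second": second,
        "top_ratio": n_top / n_second,
        "top2_share": (n_top + n_second) / total,
    }


# Toy counts only; "other" is a hypothetical placeholder category.
toy_values = ["Relatively New"] * 6 + ["New Property"] * 2 + ["other"]
insights = category_insights(toy_values)
```

In the toy data the top value is three times as frequent as the second, and the top two together cover 8 of 9 rows.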

super_builtup_area

numerical

Approximate Distinct Count 589
Approximate Unique (%) 31.1%
Missing 1923
Missing (%) 50.4%
Infinite 0
Infinite (%) 0.0%
Memory Size 30304
Mean 1924.0799
Minimum 325
Maximum 9997
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • super_builtup_area is skewed right (γ1 = 1.8654)

Quantile Statistics

Minimum 325
5-th Percentile 793
Q1 1478.25
Median 1828
Q3 2215
95-th Percentile 3184
Maximum 9997
Range 9672
IQR 736.75

Descriptive Statistics

Mean 1924.0799
Standard Deviation 760.5025
Variance 578364.0799
Sum 3.6442e+06
Skewness 1.8654
Kurtosis 10.3916
Coefficient of Variation 0.3953
  • super_builtup_area is not normally distributed (p-value 2.447764502073273e-08)
  • super_builtup_area has 85 outliers
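
For columns with many missing values, the two percentages in the header use different denominators. The missing percentage is taken over all 3817 rows, while the reported unique percentage matches the distinct count divided by the non-missing rows only. The sketch below makes that explicit; `missing_and_unique` is a hypothetical helper that counts distinct values exactly, whereas the report's count is labeled approximate.

```python
def missing_and_unique(values):
    """values: list where None marks a missing entry (assumption)."""
    n = len(values)
    present = [v for v in values if v is not None]
    distinct = len(set(present))
    return {
        "missing": n - len(present),
        "missing_pct": 100 * (n - len(present)) / n,
        "distinct": distinct,
        # Denominator is the number of non-missing values.
        "unique_pct": 100 * distinct / len(present) if present else 0.0,
    }


# Toy data only.
toy = missing_and_unique([1200.0, None, 1450.0, 1200.0, None, None])

# Cross-checks from the reported counts for super_builtup_area.
missing_pct_reported = round(100 * 1923 / 3817, 2)
unique_pct_reported = round(100 * 589 / (3817 - 1923), 1)
```

The cross-checks give 50.38% missing and 31.1% unique, matching the figures above.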

builtup_area

numerical

Approximate Distinct Count 653
Approximate Unique (%) 36.3%
Missing 2018
Missing (%) 52.9%
Infinite 0
Infinite (%) 0.0%
Memory Size 28784
Mean 7312.9423
Minimum 45
Maximum 8.7088e+06
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • builtup_area is skewed right (γ1 = 41.9441)

Quantile Statistics

Minimum 45
5-th Percentile 450
Q1 1170
Median 1700
Q3 2430
95-th Percentile 4517
Maximum 8.7088e+06
Range 8.7088e+06
IQR 1260

Descriptive Statistics

Mean 7312.9423
Standard Deviation 206016.0423
Variance 4.2443e+10
Sum 1.3156e+07
Skewness 41.9441
Kurtosis 1768.1347
Coefficient of Variation 28.1714
  • builtup_area is not normally distributed (p-value 4.226655744243881e-25)
  • builtup_area has 125 outliers

carpet_area

numerical

Approximate Distinct Count 705
Approximate Unique (%) 36.7%
Missing 1896
Missing (%) 49.7%
Infinite 0
Infinite (%) 0.0%
Memory Size 30736
Mean 2771.2244
Minimum 15
Maximum 607716
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • carpet_area is skewed right (γ1 = 23.8815)

Quantile Statistics

Minimum 15
5-th Percentile 483
Q1 900
Median 1350
Q3 1800
95-th Percentile 3155
Maximum 607716
Range 607701
IQR 900

Descriptive Statistics

Mean 2771.2244
Standard Deviation 22741.6039
Variance 5.1718e+08
Sum 5.3235e+06
Skewness 23.8815
Kurtosis 591.8569
Coefficient of Variation 8.2063
  • carpet_area is not normally distributed (p-value 4.230082756260158e-25)
  • carpet_area has 97 outliers

study room

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 251922
  • The most frequent value (0) occurs over 4.35 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 3817
  • The top 2 categories (0, 1) take over 50.0%
  • study room has words of constant length

servant room

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 251922
  • The most frequent value (0) occurs over 1.85 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 3817
  • The top 2 categories (0, 1) take over 50.0%
  • servant room has words of constant length

store room

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 251922
  • The most frequent value (0) occurs over 9.87 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 3817
  • The top 2 categories (0, 1) take over 50.0%
  • store room has words of constant length

pooja room

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 251922
  • The most frequent value (0) occurs over 4.67 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 3817
  • The top 2 categories (0, 1) take over 50.0%
  • pooja room has words of constant length

others

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 251922
  • The most frequent value (0) occurs over 8.09 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 3817
  • The top 2 categories (0, 1) take over 50.0%
  • others has words of constant length

furnishing_type

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 251922
  • The most frequent value (0) occurs over 2.35 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 0
3rd row 0
4th row 1
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 3817
  • The top 2 categories (0, 1) take over 50.0%
  • furnishing_type has words of constant length

luxury_score

numerical

Approximate Distinct Count 161
Approximate Unique (%) 4.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 61072
Mean 69.6398
Minimum 0
Maximum 174
Zeros 530
Zeros (%) 13.9%
Negatives 0
Negatives (%) 0.0%
  • luxury_score is skewed right (γ1 = 0.4894)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 28
Median 56
Q3 108
95-th Percentile 174
Maximum 174
Range 174
IQR 80

Descriptive Statistics

Mean 69.6398
Standard Deviation 53.2699
Variance 2837.6813
Sum 265815
Skewness 0.4894
Kurtosis -0.8541
Coefficient of Variation 0.7649
  • luxury_score is not normally distributed (p-value 4.491017250614045e-15)

Interactions

Correlations

Missing Values